CLAIMS 



What is claimed is: 

1. In a computer system where forecasting of computing resources is performed based on past 
and present observations of measurements related to said resources, a method for preprocessing 
including decomposing said past and present observations into a smooth time sequence, a jump 
time sequence, a noise time sequence and a spike time sequence, the method comprising: 

detecting the spikes in a signal representing said measurements; 

detecting the jumps in said signal; 

removing spikes and jumps from said signal; and 

removing the noise from the signal, to obtain a smooth version of the signal. 

2. The method of claim 1, wherein the removing of the noise comprises: 

estimating the variance of the noise. 

3. The method of claim 2, further comprising: 

estimating the variance of the noise prior to said detecting of said spikes. 

4. The method of claim 1, further comprising: 

estimating the variance of the noise prior to said detecting of said spikes. 

5. The method of claim 1, wherein estimates of the quantities necessary to decompose the 
sequence are performed by first applying an invertible transform to the data associated with said 
observations. 

6. The method of claim 5, wherein said transform comprises a discrete wavelet transform 
(DWT). 

7. The method of claim 5, wherein said transform comprises a discrete fast Fourier transform. 

8. The method of claim 6, further comprising estimating a variance of the noise using the 
coefficients of a highest frequency subband of the wavelet transform. 

9. The method of claim 8, wherein the spikes are detected as up-and-down and down-and-up 
local variations, said spikes being judged not to be noise based on a result of a statistical test 
based on said estimating of the noise variance. 

10. The method of claim 1, wherein said forecasting is for capacity management. 

11. The method of claim 1, wherein said forecasting is for software rejuvenation. 

12. The method of claim 1, wherein said forecasting is performed by estimating a trend of the 
smooth time sequence. 

13. The method of claim 1, wherein seasonal components are removed from the smooth time 
sequence. 

14. The method of claim 1, wherein seasonal components are separately removed from the 
smooth time sequence, the noise sequence, the spike sequence and the jump sequence. 

15. The method of claim 1, wherein said forecasting is performed on each of said smooth time 
sequence, said jump time sequence, said noise time sequence, and said spike time sequence. 

16. The method of claim 1, wherein said forecasting is performed on statistics of the jump time 
sequence, the noise time sequence and the spike time sequence. 

17. The method of claim 1, wherein preprocessed data is based on the smooth version of the 
signal input to a forecasting algorithm for analyzing the preprocessed data, to produce a 
prediction. 

18. The method of claim 1, wherein the signal is denoted by x, its value at time i by x(i), a 
sequence of values by {x(t)}, and a part of a sequence between times i and k by {x(i),...,x(k)}, and 

wherein the noise represents unpredictable localized variations of the signal and is 
denoted by z, the spikes represent unpredictable localized up-and-down or down-and-up 
variations not judged to be noise, denoted by p, and the jumps are a zero-mean component, which 
is a piecewise constant other than at discontinuity points where the signal changes either upwards 
or downwards, but not in both directions, denoted by j, and the smooth time sequence is a 
difference between the signal and a sum of noise, spikes and jumps, denoted by y such that 

{x(t)} = {z(t) + p(t) + j(t) + y(t)}. 

19. The method of claim 2, wherein the estimating of the variance of the noise includes: 

computing a Discrete Wavelet Transform (DWT) of the signal; 

extracting from the DWT a predetermined frequency subband depending upon a format of 
the DWT; and 

producing an initial noise variance based on said predetermined frequency subband. 
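
For illustration only, the steps of claim 19 might be sketched as follows. The sketch assumes a single-level Haar transform and a median-absolute-deviation estimate; neither choice is stated in the claims, and the function names `haar_finest_details` and `initial_noise_variance` are hypothetical.

```python
import math
import random
import statistics


def haar_finest_details(x):
    """Finest-scale (highest-frequency) Haar detail coefficients of x."""
    return [(x[2 * k] - x[2 * k + 1]) / math.sqrt(2.0) for k in range(len(x) // 2)]


def initial_noise_variance(x):
    """Initial noise variance from the finest Haar subband.

    Assumption: a median-absolute-deviation (MAD) estimate is used because
    it is little affected by a few large coefficients caused by spikes.
    """
    details = haar_finest_details(x)
    mad = statistics.median(abs(d) for d in details)
    sigma = mad / 0.6745  # for zero-mean Gaussian noise, MAD is about 0.6745 * sigma
    return sigma * sigma


random.seed(0)
# Hypothetical test signal: slow linear trend plus Gaussian noise of variance 4.
signal = [0.01 * t + random.gauss(0.0, 2.0) for t in range(4096)]
estimate = initial_noise_variance(signal)
```

On this synthetic signal the estimate should land close to the true variance of 4, because the slow trend contributes almost nothing to the finest subband.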

20. The method of claim 19, further comprising correcting an initial estimate of the variance 
using the spike and the jump series extracted during a previous decomposition. 

21. The method of claim 1, wherein said removing of said spikes comprises: 

producing a Discrete Wavelet Transform (DWT) of the signal; 
extracting a predetermined frequency band of the DWT; and 

identifying candidate locations of spikes based on a predetermined frequency band and a 
noise variance estimate. 

22. The method of claim 21, wherein the identifying of candidate locations includes thresholding 
the predetermined frequency band with a threshold that depends on the estimate of the noise 
variance, such that each element of the predetermined frequency band having an absolute value 
larger than the threshold is declared a candidate location, and 

wherein the threshold is selected to discard the noise and to retain values not statistically 
defined by a noise process. 
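
A minimal sketch of the thresholding in claim 22, assuming the threshold is a fixed multiple `k` of the estimated noise standard deviation. The multiplier, its default value of 3, and the function name are illustrative assumptions, not values taken from the claims.

```python
import math


def candidate_locations(subband, noise_variance, k=3.0):
    """Indices of subband elements whose absolute value exceeds k * sigma.

    k is a hypothetical tuning constant; the claims only require that the
    threshold depend on the noise variance estimate.
    """
    threshold = k * math.sqrt(noise_variance)
    return [i for i, c in enumerate(subband) if abs(c) > threshold]


# Hypothetical subband: small noise-like coefficients with two large outliers.
subband = [0.2, -0.5, 0.1, 7.5, -0.3, 0.4, -6.1, 0.0]
candidates = candidate_locations(subband, noise_variance=1.0)
```

With a noise variance of 1 the threshold is 3, so only the elements at indices 3 and 6 are declared candidate locations.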

23. The method of claim 21, further comprising: 

identifying actual individual spikes, wherein said identifying actual individual spikes 
comprises: 

taking as inputs the signal and the estimated variance of the noise; and 
letting subscript i denote a location of a candidate spike and assuming that a spike of 
width w at location i is a sequence of (w+1) samples, x_{i-1}, x_i, x_{i+1}, such that a difference 
d_1 = (x_{i-1} - x_{i+1}) is a sum variation in a smooth signal plus additive noise, and that the 
absolute value of a difference d_2 = [x_i - (x_{i-1} + x_{i+1})/2] is too large to be explained by a 
same model, computing the differences d_1 and d_2, and comparing the differences to two 
thresholds T_1 and T_2, 

wherein if an absolute value of d_1 is less than T_1 and the absolute value of d_2 is larger 
than T_2, then the candidate spike is declared to be a spike. 
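
The two-threshold test of claim 23 can be sketched as below, assuming a spike that occupies a single sample with neighbors at i-1 and i+1. The thresholds are derived here from the noise variance with hypothetical multipliers `k1` and `k2`; the claims do not specify how T_1 and T_2 are chosen.

```python
import math


def is_spike(x, i, noise_variance, k1=3.0, k2=3.0):
    """Apply the d_1 / d_2 test at interior index i (0 < i < len(x) - 1)."""
    sigma = math.sqrt(noise_variance)
    d1 = x[i - 1] - x[i + 1]
    d2 = x[i] - (x[i - 1] + x[i + 1]) / 2.0
    # Under pure noise, d1 has variance 2 * sigma^2 and d2 has variance
    # 1.5 * sigma^2; the thresholds scale those standard deviations (assumption).
    t1 = k1 * sigma * math.sqrt(2.0)
    t2 = k2 * sigma * math.sqrt(1.5)
    return abs(d1) < t1 and abs(d2) > t2


spike_signal = [10.0, 10.2, 25.0, 10.1, 9.9]   # up-and-down excursion at index 2
jump_signal = [10.0, 10.0, 20.0, 20.0, 20.0]   # level change, not a spike
spike_found = is_spike(spike_signal, 2, noise_variance=1.0)
jump_rejected = not is_spike(jump_signal, 2, noise_variance=1.0)
```

The first condition rejects the jump, because the neighbors on either side of a jump differ by far more than noise would allow, while the second condition keeps the spike.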

24. The method of claim 1, wherein said detecting of said jumps comprises: 

computing a Discrete Wavelet Transform (DWT) of the signal and producing a sequence 
of details obtained while computing the DWT, such that the details have a same length as the 
signal, wherein a collection of first k details plus the signal forms a (k+1)-level multi-resolution 
pyramid; 

iterating over the (k+1) levels of the pyramid, such that at each said level a windowed 
linear regression is computed to produce a time series of slopes, wherein a time series of slopes is 
computed for the signal and a time series of slopes is computed for each of the details, such that 
a total of (k+1) time series are produced, 

wherein said windowed linear regression uses a sliding window of width w, which takes 
as an input a time series {x(1),...,x(n)} and produces n-w+1 subseries {x(1),...,x(w)}, 
{x(2),...,x(w+1)}, ..., {x(n-w+1),...,x(n)}, and a linear regression is computed for each of the 
subseries, by fitting a straight line segment to the data using a least squares criterion, and a slope 
sequence {s(1),...,s(n-w+1)} is generated. 
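
As an illustration of the windowed linear regression recited in claim 24, the following sketch fits a least-squares line to each length-w window and collects the slopes. The function name `windowed_slopes` is hypothetical, and the sketch assumes w is at least 2 and uniformly spaced samples.

```python
def windowed_slopes(x, w):
    """Least-squares slope of each length-w window; returns n - w + 1 slopes."""
    n = len(x)
    t_mean = (w - 1) / 2.0
    # Sum of squared deviations of the time index, identical for every window.
    denom = sum((t - t_mean) ** 2 for t in range(w))
    slopes = []
    for start in range(n - w + 1):
        window = x[start:start + w]
        x_mean = sum(window) / w
        num = sum((t - t_mean) * (v - x_mean) for t, v in enumerate(window))
        slopes.append(num / denom)
    return slopes


# Hypothetical input: a perfect ramp with slope 2.
ramp = [2.0 * t + 1.0 for t in range(10)]
slopes = windowed_slopes(ramp, w=4)
```

For n = 10 and w = 4 the function returns 7 slopes, matching the n-w+1 count in the claim, and each slope equals the ramp's slope of 2.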

25. The method of claim 24, wherein by applying windowing and segment fitting to each of the 
details, k+1 slope series of length n-w+1 are generated, 

said detecting of the jumps further comprising: 

after all of the slope series are computed, computing a product of the slopes, wherein 
the jumps appear as extrema of the slopes at all the levels of the multi-resolution pyramid; 

extracting local extrema and producing a list of candidate jump locations; and 

removing extrema having a predetermined small absolute value and retaining extrema 
with a predetermined large absolute value, which are declared jumps. 
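
A minimal sketch of the slope-product step in claim 25, assuming the slope series for all pyramid levels have equal length and that a single cutoff `min_abs` separates small extrema from large ones. The cutoff and the function name are hypothetical.

```python
def jump_candidates(slope_series, min_abs):
    """Multiply slope series pointwise, then keep large local extrema."""
    product = [1.0] * len(slope_series[0])
    for series in slope_series:
        product = [p * s for p, s in zip(product, series)]
    jumps = []
    for i in range(1, len(product) - 1):
        is_max = product[i] > product[i - 1] and product[i] >= product[i + 1]
        is_min = product[i] < product[i - 1] and product[i] <= product[i + 1]
        # Discard extrema with small absolute value; retain the large ones.
        if (is_max or is_min) and abs(product[i]) > min_abs:
            jumps.append(i)
    return jumps


# Hypothetical slope series for two pyramid levels; both peak at index 3.
level0 = [0.1, 0.0, 0.2, 3.0, 0.1, -0.1, 0.0]
level1 = [0.0, 0.1, 0.3, 2.0, 0.2, 0.1, 0.1]
jumps = jump_candidates([level0, level1], min_abs=1.0)
```

Because a jump produces a large slope at every level, the product reinforces it, while noise-driven slopes that appear at only one level are suppressed.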

26. The method of claim 1, wherein said noise removal comprises: 

subtracting the spike series and the jump series from the signal, thereby producing a 
sequence which contains a sum of the smooth signal and the noise sequence; 

based on the sequence, estimating a variance of the noise; 

computing an energy of the sequence, by computing a sum of the squares of the values; 

removing the noise from the sequence, and using the estimate of the noise variance by 
computing a wavelet transform of the sequence, wherein coefficients of the transform are soft- 
thresholded using a threshold T based on the estimate of the noise variance, and 

wherein the soft-thresholding with threshold T applied to a value x is the operator S(x,T) 
defined as: 

    S(x,T) = x - T   if x > T, 
    S(x,T) = x + T   if x < -T,        (1) 
    S(x,T) = 0       otherwise, 

wherein the operation (1) is applied to all values of residual and detail sequences of the 
transform, and a thresholded signal is obtained; 

restoring the energy by taking as inputs the thresholded signal and the energy of the 
signal, and dividing the energy by the energy of the thresholded signal, taking the square root, 
and multiplying the result by the thresholded signal, to produce the smooth signal having a same 
energy as a signal having the smooth and noise components; and 

obtaining the noise by subtracting the smooth signal from the signal having the smooth 
and noise components. 
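
The soft-thresholding operator and the energy restoration of claim 26 can be sketched in isolation as follows. The wavelet transform and its inverse are omitted, so the operator is applied directly to a short list of hypothetical values; the function names are assumptions made for illustration.

```python
import math


def soft_threshold(x, t):
    """Operator S(x, T): shrink x toward zero by t, zeroing values within [-t, t]."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def restore_energy(thresholded, energy):
    """Rescale thresholded so that its sum of squares equals energy."""
    thresholded_energy = sum(v * v for v in thresholded)
    if thresholded_energy == 0.0:
        return list(thresholded)  # nothing to rescale
    scale = math.sqrt(energy / thresholded_energy)
    return [scale * v for v in thresholded]


# Hypothetical values standing in for transform coefficients.
values = [5.0, -0.4, 0.3, -4.0, 0.1]
energy = sum(v * v for v in values)
shrunk = [soft_threshold(v, 1.0) for v in values]
smooth = restore_energy(shrunk, energy)
```

Soft-thresholding lowers the energy of the sequence, which is why the claim rescales the thresholded signal by the square root of the energy ratio.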

27. The method of claim 1, where at least one of said computing resources is exhaustible. 

28. A computer system, comprising: 

a forecast module for forecasting computing resources based on observations of 
measurements related to said resources; and 

a data preprocessing module for decomposing said observations into a smooth time 
sequence, a jump time sequence, a noise time sequence and a spike time sequence, wherein said 
data preprocessing module comprises: 

an estimator for estimating a variance of the noise; 

a detector for detecting the spikes and jumps in said estimated variance of the noise; and 

a subtractor for subtracting spikes and jumps from the signal, said estimator further 
estimating the variance of the noise of said signal based on said spikes and jumps having been 
removed, and said subtractor removing the noise, to obtain a smooth version of the signal. 

29. A signal-bearing medium tangibly embodying a program of machine-readable instructions 
executable by a digital processing apparatus to perform a method for preprocessing data to be used 
for forecasting of computing resources based on observations of measurements related to said 
resources, said method for preprocessing including: 

decomposing said observations into a smooth time sequence, a jump time sequence, a 
noise time sequence and a spike time sequence, said decomposing comprising: 
detecting the spikes in a signal representing said measurements; 
detecting the jumps in said signal; 
removing spikes and jumps from said signal; and 
removing the noise from the signal, to obtain a smooth version of the signal. 